fix(vector_math): log corrupt blobs (N6) + document endianness (N5) (LOC-63) #66
Merged
…(LOC-63) PR5 (hygiene), backend-agnostic.

N6: HNSW index-build paths decoded embeddings with decode_f32_embedding(..).unwrap_or_default(), turning a corrupt blob into an empty Vec that the !is_empty() filter then dropped silently. Add decode_f32_embedding_or_warn(blob, row_id): it logs a warning (with the row id) on failure and returns an empty Vec, so behaviour is preserved (the row is still dropped) but corruption is visible. Dedups the pattern across the 6 sites (simple_rag, source_rag).

N5 (conservative): on decode_f32_embedding, document the native-endian / little-endian storage assumption and the alignment reason why zero-copy decode is unsound. No format change (all targets are LE); full normalization and a shared encode helper are left as backlog.

Journal: PR5.md, README (PR3 on hold / PR4 discarded / PR5 in progress), risk-register R7+N6 closed, and RETRO.md (final retrospective).
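As a rough illustration of the N6 change described above, the helper could look like the sketch below. This is not the PR's actual code. The `Option` return type of `decode_f32_embedding`, its length check, and the `i64` row id are assumptions, and `eprintln!` stands in for `log::warn!` so the snippet compiles without external crates.

```rust
/// Hypothetical stand-in for the crate's decoder (signature assumed):
/// returns None when the blob length is not a multiple of 4 bytes.
fn decode_f32_embedding(blob: &[u8]) -> Option<Vec<f32>> {
    if blob.len() % 4 != 0 {
        return None;
    }
    Some(
        blob.chunks_exact(4)
            // Copy each 4-byte chunk into an f32 using native byte order.
            .map(|c| f32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}

/// Sketch of the warn-on-failure wrapper: same result as
/// `unwrap_or_default()` (an empty Vec), but the failure is reported
/// together with the row id instead of disappearing silently.
fn decode_f32_embedding_or_warn(blob: &[u8], row_id: i64) -> Vec<f32> {
    match decode_f32_embedding(blob) {
        Some(values) => values,
        None => {
            // The PR uses log::warn! here; eprintln! is a dependency-free stand-in.
            eprintln!(
                "warn: corrupt embedding blob for row {row_id} ({} bytes), dropping",
                blob.len()
            );
            Vec::new()
        }
    }
}
```

Call sites keep their existing `!is_empty()` filter, so a corrupt row is still excluded from the index; the only difference is that the drop now leaves a trace.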
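What / Why (PR5, hygiene, backend-agnostic)

N6: make silent drops of corrupt blobs visible
Six HNSW index-build paths called decode_f32_embedding(..).unwrap_or_default(), which turned a corrupt embedding into an empty vector that the !is_empty() filter then dropped silently.
Add the decode_f32_embedding_or_warn(blob, row_id) helper: on failure it calls log::warn! (including the row id) and returns an empty Vec, so behaviour is preserved. The pattern is deduplicated across the 6 sites.

N5: endianness (conservative: documentation only)
Embeddings are stored and read in native-endian order. The encoding sites are scattered across 5 places and all targets are LE, so normalizing would have no practical effect; the format is therefore left unchanged.
The decode_f32_embedding doc comment now states the native-endian/LE assumption and that zero-copy casting is not possible (alignment is not guaranteed). A shared encode helper is left as backlog.

Verification (local)
cargo test --lib vector_math: 3 green
--features vector_faer: 4 green (including parity)
cargo check --features vector_faer,vector_quant_i8: passes (including the helper call in the quant branch)

Notes
Details: docs/perf/vector-math-refactor/PR5.md. I will merge this myself (after CI is green).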
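To make the N5 note concrete, here is a minimal sketch of what an explicitly little-endian encode/decode pair might look like if the backlog item were picked up. The names `encode_f32_embedding_le` and `decode_f32_embedding_le` are hypothetical and not part of this PR, which keeps the native-endian format and only documents the assumption.

```rust
/// Hypothetical shared encoder (not in this PR): always writes
/// little-endian bytes, regardless of the host's byte order.
fn encode_f32_embedding_le(values: &[f32]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_le_bytes()).collect()
}

/// Hypothetical LE decoder: copies each 4-byte chunk into an f32.
/// Copying avoids reinterpreting the byte buffer as &[f32], which would
/// require the buffer to be 4-byte aligned; a blob read from storage
/// carries no such guarantee.
fn decode_f32_embedding_le(blob: &[u8]) -> Option<Vec<f32>> {
    if blob.len() % 4 != 0 {
        return None; // a truncated or corrupt blob cannot hold whole f32 values
    }
    Some(
        blob.chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

On little-endian targets `to_le_bytes` and `to_ne_bytes` produce identical bytes, which is consistent with the PR's statement that normalizing would have no practical effect today.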